arXiv:1506.02900vl [math.NA] 9 Jun 2015 


A variable metric forward-backward method with extrapolation* 


S. Bonettini, F. Porta, V. Ruggiero 
June 10, 2015 


Abstract 

Forward-backward methods are a very useful tool for the minimization of a functional 
given by the sum of a differentiable term and a nondifferentiable one and their investigation 
has experienced several efforts from many researchers in the last decade. In this paper 
we focus on the convex case and, inspired by recent approaches for accelerating first-order 
iterative schemes, we develop a scaled inertial forward-backward algorithm which is based 
on a metric changing at each iteration and on a suitable extrapolation step. Unlike standard 
forward-backward methods with extrapolation, our scheme is able to handle functions whose 
domain is not the entire space. Both an convergence rate estimate on the objective 

function values and the convergence of the sequence of the iterates are proved. Numerical 
experiments on several test problems arising from image processing, compressed sensing and 
statistical inference show the effectiveness of the proposed method in comparison to well 
performing state-of-the-art algorithms. 


1 Introduction 


In this paper we are interested in solving the optimization problem 

min F{x) = f{x) + g{x) (1) 

where / and g are proper, convex and lower semicontinuous functions from M"" to M U {00}. 
Moreover, we assume that / is differentiable with Lipschitz continuous gradient on a suitable 
closed, convex set Y C dom(/) = {x G M"’ : f{x) < 00}, such that 

dom(/) 3 V 2 dom(5() 

We also suppose that g is bounded from below over its domain and problem ([I]) admits at least 
a solution. Formulation ([T|) includes also constrained problems over a closed convex set R C M” 
where / has Lipschitz continuous gradient: indeed, the constraints defined by R can be inserted 
into the model by adding to g the indicator function of the feasible set itself, i.e. 

min/(x) -I- g{x) = min /(x) -|- g{x) + in{x) 

where 

, . f 0 if X G R 
L^[x) = < ,1 . 

I -boo otherwise 


‘This work has been partially supported by MIUR under the project FIRB - Futuro in Ricerca 2012, contract 
RBFR12M3AC. The Italian GNCS - INdAM is also acknowledged. 
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Problem ([T|) is relevant in various domains of applied science such as signal and image processing, 
statistical inference and machine learning. A typical feature of these applications is the large 
number of variables, which makes the class of first order methods very attractive. In this class, 
forward-backward methods | 12 t fTH] are especially suited for problem ([T]) , since they exploit the 
decomposition of the objective function in a differentiable term and a nondifferentiable one. The 
general forward-backward iteration is given by 

^(fc+i) ^ ^{k) ^ Afc(prox„^^(x('=) - , 

where Xk, otk are positive parameters controlling the steplength and prox 0 (-) is the proximity 
operator associated to the convex function </>, defined as 

prox 0 (y) = argmin (^{x) + ^\\x - y\\‘^ 

Forward-backward methods are easy to implement and have well studied convergence properties. 
On the other hand, it is well known that they can exhibit a poor convergence rate, especially 
when a high accuracy is required. 

In the recent literature, we can find two different approaches aiming to improve the conver¬ 
gence speed of forward-backward methods. They are both described below. 

Inertial/estrapolation techniques. This approach consists in adding an extrapolation step 
to the basic forward-backward iteration, yielding a multistep algorithm, called also heavy ball or 
inertial method |26l p.65]. The idea of inertial methods became very popular in the last decade, 
in view of Nesterov’s work |24] and it has been further developed in [3], where the authors 
propose the following variant 

= prox„^g(y(*’)-afcV/(y(''))). (3) 

In [3l[6], the convergence of method ([2])-([3|) is investigated by showing that for suitable sequences 
of parameters {uk} and {j3k} (with limfc (3k = 1) one has F{x^^'>) — F* = O (^), where F* is the 
optimal value of the objective function. Recently, under additional assumptions on the sequences 
{ak} and {(3k}, Chambolle and Dossal in [TO] proved the convergence of the iterates while 

in |29| the authors propose a variant of the inertial scheme where the proximal point ([3|) can 
be computed inexactly. We also mention the recent work [25] where inertial forward-backward 
algorithms are analyzed when f{x) is not convex. 

A drawback in the use of method (l2|)-([3|) is that it may be unfeasible when dom(/) does 
not coincide with the whole space R"", since the point computed in ([ 2 |) does not necessarily 
belong to dom(/). 

Variable metric/scaling techniques. In a variable metric forward-backward algorithm, the 
underlaying metric may change at each iteration by means of suitable symmetric positive dehnite 
scaling matrices multiplying the gradient of / and also involved in the definition of the proximity 
operator. The expected advantage in using a variable metric is an improved capability to capture 
the local features of problem ([T|), possibly leading to an improvement of the convergence speed 
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(think for example to the Newton’s method). In [131114j . the authors propose and analyze the 
following variable metric forward-backward algorithm 

^(fc+i) ^ ^{k) ^ Afc(proxf^‘^^(x(^) - akD-^Vfix^’^^)) - (4) 

where {Dk} is a user supplied sequence of symmetric positive definite matrices and prox^^''g(y) 
is defined as ^ 

WOXaiiqiy) = argmin g{x) + -—{x - y)^Dk{x - y) (5) 

a;GR" ^Otk 

The authors also devise conditions on the sequence {D^} ensuring the convergence of 
Method (HI), equipped by an Armijo line-search for the computation of A^, has been extensively 
studied in the papers for constrained minimization, when g{x) reduces to the indicator 

function of a closed convex subset of R"'. The convergence rate on the objective function values in 
this case is only linear, i.e. F{x^^'^) — F* = O (^) (see mm- However, in spite of the theoretical 
convergence rate, a suitable combination of the stepsize parameter ak and the scaling matrix 
Dk can allow method (Hj) to reach practical performances which are comparable with those of 

(H-daD [HIST]. 

Main contribution. In this paper we propose an algorithm combining the two acceleration 
techniques described above, designing an original variable metric forward-backward method with 
extrapolation. 

In particular, we address the case where dom(/) is a proper subset of R”’ and we devise 
suitable conditions on the stepsize parameters and on the scaling matrices sequence to ensure 
both the convergence of the iterates sequence {x^^^ and the O (^) rate for the objective 
function values. 

The effectiveness of the proposed method is evaluated by means of a comparison with other 
state-of-the-art algorithms, on several optimization problems of the form ([T]), arising from differ¬ 
ent real-life applications such as image deblurring, compressed sensing and probability density 
estimation. 

The plan of the paper is the following. In Section [2] we collect some definitions and intro¬ 
ductory results. Section [3] is devoted to the description of the proposed algorithm while the 
convergence rate analysis is performed in Section 13.11 and the convergence of the iterates to a 
minimizer of the optimization problem is proved in Section 13.21 This last section is strongly 
inspired from a recent paper by Chambolle and Dossal [TO]. Section 0] deals with the results 
of the numerical experiments we performed on some test problems arising in image and signal 
processing and statistical inference. Finally, our conclusions are given in Section [5l 

2 Notation, definitions and basic results 

We denote by || • || the Euclidean norm of a vector while \\-\\d indicates the norm induced by the 
symmetric positive definite matrix D, i.e. ||x||^ = x"^Dx. Furthermore, in the subspace iSn(R) 
of the symmetric real matrices of order n, we consider the following Loewner partial ordering 

\/Di,D2 e 5n(R) DiF D 2 x^Dix > x^D 2 X Vx € R” 

For any r/ G R, r/ > 0 we define the set C 5n(R) as the set of all positive definite matrices 
D such that D F gF Clearly, if D G the eigenvalues of D are lower bounded by g and for 
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each u G R”, the following inequality holds 

T/||ti|p < u^Du = \\u\\\, (6) 

The following lemma states a well known property of the projection operator, whose proof runs 
exactly as in 1201 p.48] 

Lemma 2.1 Let 12 C M"' he a closed eonvex set and define the scaled Euclidean projection oper¬ 
ator associated to D as 

Pq,d{x) = argmin \\y - x\\% (7) 

y&n 

for any x G M"'. Then, the operator m is nonexpansive with respect to the norm induced by the 
matrix D, i.e. 

\\Pn,D{x) - Pn,D{z)\\D < ||a: - z\\d (8) 

for all X, 2: G M"'. 

For every x gY and y G M"' we define 

(.{y-x) = fix) + V/(x)'^(y - x) (9) 

and 

q{y; x) = £{y, x) + g{y) (10) 


Definition 2.1 ^ smooth funetion f : M” —)■ MU {00} has L-Lipschitz continuous gradient on 
the set D C domif) if there exists L > 0 such that 

||V/(x) - V/(y)|| < L||x - y\\, Vx,y G D. 

We recall also the following result for smooth functions with L-Lipschitz continuous gradient. 


Lemma 2.2 Let f : M” —>■ M U {00} be a eontinuously differentiable funetion with L-Lipsehitz 
eontinuous gradient on Y CM” and D gV^. Then, for every x,y gY we have 

fiy) <£iy;x) + ^\\x-y\\'jj ( 11 ) 

for all a <g/L. 

Proof. From ([6|) we obtain 

^iy,x) + ^\\x-y\\l, > ^{y^x) + ^\\x-yf > /(y) 


where the rightmost inequality holds for ^ < 1/L, thanks to Lemma 6.9.1 in [6] (see also [3l 
Lemma 2.1]). □ 
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Definition 2.2 Let g : M”' — )• MU{oo} be a convex function. Then, the subdifferential of g at 
X G ffi” is the set 


dg{x) = {w; G M" : g{y) > g{x) + [y - xf'w, Vy G M"} 


A point X is a minimizer of g if and only if 0 G dg{x). 

As a consequence of the previous definition, we have that x G M”' is a solution of problem ([T]) if 
and only if —Vf{x) G dg{x). 

Given a positive number a and a matrix D G we now define the following function 


Qa,D{y]x) 


q{y,x) + ^W^-yWi) 


( 12 ) 


for y G M”, x gY. The function Qa,Di'',x) admits a unique minimizer, which will be denoted 
by 

Pa,D (x) = argmin Qa,D (y;x) (13) 

yGR" 

Clearly, the point Pa,D{x) belongs to dom(y) and, when p^^oix) = x, x is a minimizer of F. A 
simple computation shows that 


Pa,D{x) 


argminy(y) + — Ily-x + aD ^Vf{x)\\t 
yGK" 2a 


(14) 


where it is more evident that the introduction of a matrix D in (1121) induces a scaling of f{y) 
by D~^. Clearly, according to ([5]), we have the equivalence pa,D{x) = prox^g(x — aZl~^V/(x)). 
In the next lemmas, two useful properties of the operator (I13p are proved. 


Lemma 2.3 Let f :Y be a continuously differentiable function with L-Lipschitz continuous 
gradient, D gV^j and g be a convex function. Let x gY and y = Pa,D{x). Then, for any z G K"", 
we have ^ ^ ^ 

Q{y,x) + —\\x - yfjj < q{z;x) + —\\z - x|||, - —\\z - y\\l (15) 

Proof. From the optimality conditions of the problem (1131) . we have that there exists a vector 
w G dg{y) such that 

V/(x) + —D{y — x) + re = 0 (16) 

a 

Since w G dg{y), for all z G M"" we have that g{z) — g{y) > uF{z — y), which, together with ([TH]) . 
implies 

g{z)-g{y)>[vf{x) + ^D{y-x'^ {y - z) (17) 
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From the definition of i{y',x) in ([9]), we can write 


+ ^\\y - x\\d = f{x) + Vf{xy{y-z + z-x) + —\\y-z + z-x\\D 

= eiz;x) + ^\\z-x\\l) + ^\\y-z\\% + 

+Vf{x)'^{y - z) + -{z - xf'D{y - z) 

a 

= yz]x) + ^\\z-xfD-^\\y-zfD + 

+ (vf{x) + ^D{y - x)^ (y-z) 

< i{z]x) + - yio + 9{z) - g{y) 


where the third equality is obtained by adding and subtracting ^y^D{y — z) and the final 
inequality is a consequence of (fT71) . Finally, inequality (fTSll follows by rearranging terms and 
recalling the definition (jlOp . □ 


A direct consequence of the previous lemma is the following result. 


Lemma 2.4 Let / : y —)• M 6e a convex, continuously differentiable function with L-Lipschitz 
continuous gradient and g be a convex function. Let F{x) = f{x) + g{x), D G Vyj and y = 
Pa,D{x), X G Y. If a is such that y satisfies the condition (ttH), then, for any z G dom{f), we 
have 

F{v) + ^ 11 ^ “ y\\^D < + ^\\z - xfjj ( 18 ) 

Proof. Since / is convex, the following inequality holds: 

F{z) > q{z; x) Vx, z 

Therefore, from Lemma 12.31 and from (|lip we have the result. □ 


3 A variable metric inertial forward-backward method with back¬ 
tracking 

In this section we describe and analyze the proposed method, whose generic scheme is detailed 
in Algorithm [TJ It consists in a variable metric forward-backward iteration (Step 4) combined 
with an extrapolation-projection step (Step 1). 

The steplength is adaptively computed via a backtracking procedure, while suitable 
choices for the extrapolation parameter fik and the scaling matrix Dk will be described dur¬ 
ing the convergence analysis in sections 13.11 and 13.21 

Algorithm [T] can be considered a generalization of the Fast Iterative Soft Tresholding Algo¬ 
rithm (FISTA, [3]), whose iteration has the form (l2])-([3|). The main novelties we introduce with 
respect to FISTA are the possibility to employ at each iteration the variable metric induced 
by the matrix Dj~ and the projection of the extrapolated point which 
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Algorithm 1 Scaled inertial forward-backward method with backtracking 

Choose oo > 0, (5 < 1, € Y. Set and define a sequence of nonnegative numbers 

{/3k} and a sequence of matrices {Dk}, with Dk S Vyj, such that 7 = sup^gf^||ii)fc|| < 00 . 

For A; = 0,1,2,... 

Step 1. Extrapolation: + I 3 k { x ^^'^ — 

Step 2. Set = Ofc-i, ifc = 0 

Step 3. Set x^^ = PakPkiv ^’^'') 

Step 4. If 

go to Step 5. 
else set 

4^4 + 1 Oik = 

and go to Step 3. 

Step 5. Set the new iterate = x^^ 

End 


allows to handle problems where dom(/) D Y does not coincide with the entire space M”. When 
y = M", EISTA is recovered by setting Dk = / for all fe > 0. 

Eor convenience, we restate below our hypotheses on problem ([T]), which we assume to be 
fulfilled throughout this section. 


(Al) f,g : M"" —)• M U { 00 } are proper, convex and lower semicontinuous; 

(A2) / is differentiable with L-Lipschitz continuous gradient on y C dom(/), Y is closed and 
convex and dom(/) ^YD dom( 5 r); 

(A3) g is bounded from below over its domain; 

(A4) problem ([T]) admits at least a solution. 

Moreover, we will indicate hereafter by {x^^^} the sequence generated by Algorithm [H while x* 
will denote any of the solutions o ©• 

Eirst, we observe that, thanks to assumption (A 2 ), Algorithm [ 1 ] is well defined, i.e. the 
backtracking loop between Steps 3 and 4 terminates in a finite number of steps. Indeed, from 
Lemma l2.21 observing that the sequence {ok} is non-increasing and that the reducing factor is 
(5 < 1 , we obtain the following inequalities 

0 < Y < Ok-i < ao (19) 

In particular, the backtracking condition implies that the new iterate satisfies 


\\yW- 

Zuk 




( 20 ) 
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In the following sections we will show that Algorithm [T] with a proper parameters setting 
has a 0{l/k‘^) convergence rate with respect to the objective function values, i.e. 

which is the same as FISTA, and, moreover, the iterate sequence converges to a minimizer 

of ([H). 


3.1 Convergence rate analysis 

In the rest of the paper we will assume that the extrapolation parameter /3fc has the form 


h 


- Ok-i) 


Ok-i 


( 21 ) 


where {9^} C (0,1] is a given sequence of parameters. Moreover, we will adopt the following 
notation 


Vk = - F{x*) 

ziF = a;(fc) + iz_^(a;(fc) _a;(fc-i)) = a;(fe-i) + ^(a;(fc) _a;(fc-i)) (22) 

9k-i 9k-i 

u(k) = ^(fc) _ a-* 

1 


Before giving the main result, we need to prove some technical lemmas. The first of them 
establishes a key inequality which is crucial for the subsequent analysis and it is analogous to 
Lemma 4.1 in [3]. 


Lemma 3.1 Let {Dk} C be a sequenee of scaling matrices and assume that {9k} satisfies 

^ 0 < < 1 ( 23 ) 

Then, we have 


2afc+itfcr’fc+i + < 2akt\_iVk + ||dj. 


(24) 


Proof. Let us define the point y* = {1 — 9k)x^^^ + 9kX*. We have y* G dom(g'). Prom (fTSli in 
Lemma 12.41 with y = x = y^^^ and z = y*, we have 

+ ^J\y* - < Fiy*) + ^J\y* - 

< (1 - 9k)F{x^^^) + 9kF{x*) + ^\\y* - 

2ak 


< (1 - 9k)F{x^’^^) + 9kF{x*) + ^\\y* - {x^^^ + 

2ak 


Dk 











where the first inequality follows from the definition of y* and the convexity of F and the second 
one from ([8]). From (|21jl and the dehnition of y*, we obtain 


i7(3;(fc+i)) + 11(1 _ < (1 - 6k)Fix^'^^) + 9,Fix*) + 

2ak 


+ 


2,(y,k 


(1 - + 9kX* - + ^fc(l ^fc-i) (^(fc) _ 

V "k-1 ) 


Dk 


Rearranging terms, we have 


I7(a;(fc+1)) + ^||^(fc+i) _ x*\\%^ < (1 - ek)Fix^^^) + OkFix*) + 

ZOLk Zo^k 

Subtracting F{x*) from both sides leads to 

Vk+i + ^J\z^’^+^^-x*\\l^ < il-ek)Vk + ^J\z^'^^-X*\\l^ 

Multiplying both sides by 2ak/9‘^ and rearranging terms gives 

2S-»+i + - ^‘llk < 2at^vt + ||2<‘l - x-lli 


Dk 


q 2 ^fc+1 T 11^' ■ “ IlDfc ^ —^2—T 11^' ' “ IlDfc 


Finally, observing that ak+i < o-k, we obtain 


,«fc+i 


-Vk+l 


+ - x*fj^^ < 2ak^-^Vk + \\z^’'^ - x*\\l,^ 


(25) 

(26) 

□ 


In view of (j23() . we obtain 
An example of sequence {Ok} and corresponding {/3fc} satisfying (l2T]l - (l23]) is the following one 

1 k = Q 


Ok = 


kta 


/3fc = 


0 k = 0 
k>l 


(27) 


with a >2. Indeed, since Ok = condition (|M]) writes also as 


4-1 + 4 — 440 


Since (pT)) implies tk = for all fe > 0, a > 2 we have 


^2 ^2 _ (k-l + a)'^ ^ (k + a) (A: + a)^ 

^k—1 ' ^k — o “r o 


a 

(A: + o)(a — 2 ) + I 




> 0 




The choice a = 2 has been proposed in [3] for computing FISTA’s extrapolation parameters, 
while the more general case a > 2 is considered in |10] . 

Our aim is now to show that the sequence {u^} is bounded. To this end, we recall the 
following lemma on summable nonnegative sequences. 
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Lemma 3.2 Let {ak}, {Cfc} o,nd {^fc} be nonnegative sequences of real numbers such that 
ak+i < (1 + Cfc)«fc + f.k and < oo, YlT=o^k < oo. Then, {ofc} converges. 

In the next lemma, we introduce a crucial assumption on the sequence of matrices {Dk} (see 

also mm)- 


Lemma 3.3 Let {0^} satisfy (I23p and define = ‘2akt\_]Vk + Assume that the 

sequence of matrices {Dk} C satisfies 

OO 

Dk+i ^ (1 + r]k)Dk VA: > 0 with ^ 0 such that E rjk < oo (28) 

k=0 


Then, {a^} is a convergent sequence. 


Proof. Setting Sk as 
in view of ([28]) . we obtain 

Ofc+l = Sfe+l + 


‘^a>iktf^_iVk 


1 “ iiDk+1 


< 

< 

< 


Sk+1 + (1 + hfc) Wok 

(1 + hfc)('Sfc+i + 

(1 + 'nk){sk + 

(1 + rjk)ak 


(29) 


where the second inequality follows from the fact that rjk > 0 for any A: > 0 and the third one 
from inequality (j24p . Thus, Lemma 13.21 implies that {ofc} converges. □ 


We are now ready to give the main result of this section, establishing the convergence rate of 
Algorithm [TJ 


Theorem 3.1 Let {Df^} cVn be a sequence of matrices satisfying ([28P and assume that {6k}, 
{jdk} are chosen as in (j27[) with a >2. Then, there exists a constant C such that 

for all k > 1. 


Proof. Lemma 13.31 guarantees in particular that there exists a constant K > 0 such that 
Ok = 2akt‘l_^Vk + < K. Since 2akt\_.^Vk < a/c, we also have 2akt\_.^Vk < K. Formula 

(I27p implies that tf._^ = . Thus, recalling the definition of Vk and the lower bound in 

(fTOP for the parameter Uk, we can write 


Vk = F{x^'^^) - F{x*) < 


a^LK 

2r]6{k — 1 + a)2 


obtaining (1301) with C = □ 

Theorem 13.11 is a generalization of Theorem 4.4 in [3], which is recovered when = I, a = 2, 

Y = M”. An analogous result for the case Dk = I, a > 2 can be found in [ID] . 

It can be observed that the optimal value of the constant C in (I30p is obtained with a = 2. 
However, as pointed out in [10] and as we will see in the following section, selecting a > 2 allows 
to prove the convergence of the iterates to a solution of ([T]). 
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Remark When the sequence of matrices {D^} satisfies the assumption ()28p and, in addition, 
supfcgi^ll-Dfcll = 7 < oo, then there exists a matrix 3 G such that —)• P pointwise [T^ 

Lemma 2.3]. A similar result holds also for a sequence of matrices {D^} C such that 
supfcgjsf II Pfc II = 7 < oo, satisfying the following condition 

OO 

Dk ^ (1 + r'fc)Pfe+i VA: > 0 with G M, > 0 such that Vk < oo (31) 

k=0 

A sufficient condition ensuring both (I28p and (|3ip is the following one 

^ OO 

— < llA’fcll < 7 fe 7 fc = 1 + Cfc where Cfc > 0 and < oo (32) 

k=0 

with 7 fc < 7 , 7 > 1. Indeed, in this case 7=7 and for any x G M”' we have 

Tn ^ 7fc+l7fc|| ||2 ^ T r, 

X Dk+ix < - X <'yk+i'ykx DkX 

Ik 

Tn ^ 7A:7A;+1 II ||2 ^ Tn 

X DkX < -||x|| < 7 fc 7 fc+ix Dk+ix 

Ik+l 


Let us define r]k = Vk = 7fc+i7fc ~ 1 = ■\/(l + Cfc+i)(l + Cfc) “ 1 and observe that the series Vk 
and ^ Ck have the same behavior, since the known limit lim^_>.o(\/l A z — Djjz = 1/2. Therefore, 
for any x G M”’, we can write 

x^Dk+ix < (1 + rik)x^DkX 
x^DkX < (1 + Vk)x^Dk+ix 

with rjk = Vk for any A: > 0 and YhVk < oo] thus a sequence of matrices chosen according to the 
rule (1321) satishes both the assumptions (f28P and (fSTT) . 

For the sequence {Dk} satisfying (f3^ and {9k} as in (iTfl) with a = 2, the convergence rate 
estimate established in Theorem 13.11 becomes 

which is very similar to that in [3]. 

3.2 Convergence of the iterates to a minimizer 

In this section, borrowing the ideas in m, we prove that the iterates converge to a 

minimizer of F, when the parameters sequences are chosen as in (1271) with a > 2 and the scaling 
matrices sequence satisfies some additional assumption. Before giving the main result, we need 
to prove some technical lemmas. 

Under the same assumptions of Theorem 13.II we first prove the boundedness of the sequence 
{||tt^^^||}. To this end, we first recall an useful lemma on summable nonnegative sequences. 

Lemma 3.4 m Let { 7 fc} be a sequence of positive numbers such that 7 ^ = 1 + T]k, rjk > 0, where 
YlV=oVk < 00 . Let Tk = nj=o7i A: > 0. Then the sequence {xk} is bounded. 


II 







Lemma 3.5 Let {6k}, {Pk} be defined such that (1211) and ([231) hold and assume that the sequence 
{Dk} C Pr; satisfies (f28l) . Then, the sequence is bounded, i.e. < U for k>0. 

Recalling definition (I29h . we obtain 

< {I+ 

— 0-T rik){sk — Sk+i 

< (1 + %)('SA: + 

< (1 + Vk){sk + (1 + '>lk-l)\\u^^^ Wok-i) 

< (^ +'>lk){^ -k Vk-l) (^Sk + (^Sk-1 — Sk + \\u^^ 

< (1 + %)(1 + %-l)('Sfc-l + (1 + %-2)||'W^^ 

< (1 + ?/A:)(l + ??fc-l)(l + ?/A:-2)('Sfc-2 + ^^llDfe_2) 

< rfc(so + ll-w^^^lliio) 

where Tk is defined as in Lemma 13.41 and we repeatedly applied the inequalities ()28p and (I24p in 
the following ones, together with the fact that iji > 0 for z > 0. Since Tk is bounded, IId^+i 

is bounded. Furthermore, from Q we have the result. □ 

The following lemma generalizes the results in cni Theorem 2]. 

Lemma 3.6 Let {Dk} CVrj be a sequence of matrices satisfying ([281) . with sup^gp^ ll-^fcll = 7 < 
oo. Assume that {dk}, {fik} are chosen as in (f27l) . with a > 2. Then the sequence {kvk} is 
summable. 


Proof. First we recall that with these settings for 9k and fik, 
of = ^, we can write the inequality ([261) as follows: 


and ()2ip are satisfied. In view 


r^k+ibk^k+i r)ik(tf, tk)Vk ^ 


‘<‘>117 


A«)||k 


Summing up from k = 0 ,..., K, since to = 1; we have 

K 

aK+itKVK+i + ^ -tl + tk)vk < 


k=l 


< 


< 


-E 

2 ^ 

k=l 

1 ^ 


W||2, _||^(W||2 


Pk 


'Dk-: 


+ 


\U 


'“> 7 . 


\u 


(A+l )||2 

WDk 




(fc )||2 


Dk-i ~ 11 “ 


(fe )||2 


Dk-i + 




k=l 

K 




< 


2 

yC/' 




‘'"’lit. 


k=l 
2 K-1 


E% + 




I Do 


k=0 
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where the second inequality follows from (1281) and the last one from the boundedness of ||-Da;|| 
and Lemma 13.51 Furthermore, using (I19p . we obtain 


K 


-4 + *-k)vk < Xi ’"= + 




k=l 


26rj 


k=0 


Then, since is a summable sequence, in view of (1251) . + tk)vk} is nonnegative and 

summable. Finally, observing that for a > 2, we have 

n / .2 ,2 , . _ - 2) + (a - 1)^ 

u S Ifc-l —^k + ^k — -- 


we can conclude that also {kvk} is summable. 


□ 


The following lemma is a consequence of the previous one. It requires an additional condition 
on the scaling matrix sequence {Df^.}. Indeed we will assume that the sequence {%} in (1281) is 
given by 

{%} = O with p>2 (33) 

This assumption guarantees also that {/cr/fc} is summable. When {Dfc} is chosen according to 
(l32l) . the condition (p3]) is satished when Cfc = ^ for any positive scalar b and p > 2. 

Lemma 3.7 Let the assumption of Lemma 13.61 be fullfiUed with { 77 ^} = p > 2 in (l28t) . 

Then, setting 6^ = — x^^~^'^\\'jj^/2, the sequence {k5k} is summable. In addition, there 

exists D > 0 such that for all k >1, 5k < ^ ■ 

Proof. From (fTSl) with x = y = and z = x^^\ it follows that 






(fc) _ 




< + 


|x(^) — y^^'^\W 


Du 


2ak 2ak 

From Lemma l2.11 since G dom(( 7 ) C Y, we have 

Then, subtracting F{x*) from both sides of ()34p . we can write 

ll^(^) _ ^(fc+l )||2 


(34) 


Vk+l + 


||™( fc ) _ ™( fe — 1 ) ||2 
Dk ^ , o2 ''Dk 

< Vk + K — 


2oik ‘2*cx.k 

From ()28p we have 

4+1 = < (l + %)i||x("+i) -x^lli 

Then, since 77 ^ > 0, from (|35p . it follows that 

4+1 < (1 + 77fc)(afe(t’fc - i^fe+i) + /3fc4) 


(35) 


Dk 


(36) 
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Since 9k = and 13k = (|36]) writes also as 

(A; + a)^4+i - (1 + 77fc)(A; - 1)^4 < {1 +'nk)ak{k + a)‘^{vk - Vk+i) 

< ao{l + r]k){k + af{vk-Vk+i) (37) 

where the second inequality follows from (I19p . 

Since {%} = 0{-^) with p > 2, limk->-oo Vkk'^ = 0; therefore, given a scalar e > 0 such that 
a? — 2a > e, there exists an index I such that for any k > i we can write 

r]k{k — 1)^ < Tjkk'^ < e < a? — 2a (38) 

Summing up the inequality (l37)l ioi k = I, ...,K yields 

K 

[K + a)^5K+i + {{k — 1 + o)^ — (1 + r]k){k — 1)^)4 < (1 + — 1)^4 + (39) 

fc=£+l 

+q:o((^ + a)^(l + r]i)vi - {K + af{l + r]K))vK+i + 

K 

+ao ^ {{k + o)^(l + ryfc) - (A; - 1 + a)^(l + r]k-i))vk 
k=e+i 

For all k we have 

(A: + a)^(l+%) - (A; - 1 + a)^(l + %_i) = %(A: + a)^ + 2(A: + a) - 1 - (A: - 1 + a)^%_i 

^ Pkik 3~ o)^ + 2(A: + o) 

and, in view of (I38p . for k > i we also have 

(A: — 1 + a)^ — (1 + Vk){k — 1)^ = + 2ka — 2a — 7jk{k — 1)^ > 2ka > 0 

Then, ignoring negative terms on the right hand side, we obtain 

K 

{K + a)‘^6K+i + 2ka6k < (1 + %)(^ — 1)^4 + (40) 

+ao i {£ + a)^{1 + r]e)ve + ^ 2(A: + a)ufc + ??fc(A; + a)^?;^ 

V k=£+l 

Since pk{k + a)^ is bounded, by Lemma 13.61 the right hand side of the previous inequality 
is uniformly bounded independently on K. This ensures that k6k is summable. Furthermore 
K‘^6k+i is globally bounded. □ 

We are now able to prove the first, weak, convergence result about the sequence generated by 
Algorithm [1] as stated in the following Corollary. 

Corollary 3.1 Let the assumptions of Lemma |g. 7| be satisfied. Then, is bounded and 

any of its limit point is a solution of problem m- 
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Proof. A direct consequence Lemma 13.71 is that the sequence is bounded. 

From Lemma I.S.51 it follows that the sequence defined in (I22p is also bounded. These two 

facts imply that the sequence is bounded. 

Assume that x is a limit point of {x^^^}, i.e. there exists a subsequence {x^^^}k£ic of {x^^^} 
such that x^^) —>■ x, /c S /C as fe —)• oo. This element x of dom(g') is also a limit point of 
Indeed, from Lemma EH the definition of and the boundedness of jSk and {L>fc}, we have 
for any A; > 1 and, in particular for k ^ JC: 

|| y ( fc ) _^( fc )||2 < i || y ( fc ) _ a ,( A :)||2 < 2 ^ 2 ^^ < 2 ^^ 

7] r] r] 

Under the assumption of Lemma [3.71 we have <5^ —)• 0, as A: —oo, thus x is a limit point of 
From assumption (1281) we have that {D^} converges pointwise to some matrix (D G 
[la Lemma 2.3] and, since Ok G [6r]/L,ao] we can assume without loss of generality that Ok 
converges to some a > 0 as A: diverges, k ^ JC. Therefore, x is a fixed point of the operator pa,X) 
and, consequently, it is a minimizer of F. □ 

Before giving the main result stating that the whole sequence converges to a minimizer, we need 
to prove the following technical lemma, which holds when the matrices sequence {Dk} satisfies 
both (1^ and (IdTT) . 


Lemma 3.8 Let the assumption of Lemma \S.6\ be fullfilled and {D^} be a sequence of matrices 

satisfying both the conditions (I28]l and (I31jl . Then, denoting ^for k > 1 we 

have 

llx* — |P 

‘hfc+i — ^k< fdk{^k — ‘hfc-i) + 2/3fe(5fe(l + rjk) + 77fc(l + fik)^k + Pkr'k-i - z - — (41) 


Proof. Let x* be a solution of the problem ([T|). Using (1181) in Lemma 12.41 with y = 
X = y^^\ z = X*, we have 


+ ^||x* - < Fix*) + ^J\x* - 


Die 


< 


Fix*) + — ||x* - x(^)||k + + —(x(^) - - X*) 

2ak 2ak Ofc 

(42) 


where the last inequality follows from Lemma l2.11 since y^^'i = Py^Dki^^^'^ + l^kix^^'^ — x^^ ^^)). 
We observe that 

(x(^) - x^’^-^'^)^Dkix^^'^ - X*) = hx^^'^ - + hx^^^ - x*\\l)^ - - x*\\l)^ 


Using this equality in (I42p we obtain 


_p(x(fc+i)) + _||x* - x(^+i)||k < Fix*) + —\\x* - x(^)||k + + 

V ) 2a;;," ^ > 2afc" 2afc" 

^ - x*\\‘f. - -llx("-i) - x*l|2 


+ 


\ 2 


IId. 


Wd, 
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Then we have 


— ||x* - —\\x*-X- -ll^, - 

2ak" 2afc" “ 


< F{x*) - F{x^’^+^^) + + 

2afc 


+ -(hl“> -2^’llk - - ^'•IIdJ 

a/c 2 ^2 * 

Since F{x*) — < 0 and /3^ + /3fc < 2/3^, we obtain 


1 


^||x* - x^^+^'>\\l^ - i||x* - < Pki^Wx* - + 2f3A 

Multiplying the last inequality by (1 + %), from (|28|) . we obtain 


“ (1 + 'nk)^k < 

II 2 ;* _ ^{k—l) ||2 

^ Pk{^ + %)(‘^fc- 2 - F 2/3fc(l + rjk)6k 

_^(fc-i)||2 

= Pki^k ~ ^k-l) + PkVk^k + Pki^k-l ~ (1 + Vk) -^- -) + 2/3fc(l + T]k)^k 

Thus, in view of the assumption (I31D . since 


_ ™(fc-l)||2 


^fe-1 = 


Dk- 


- < (1 + I'k-l) 


\X 


_ ^(fc-l)||2 


Du, 


we obtain dUD- 

Now, as in [TO], we introduce the notation 


I -\- a 

Pj,k = 1 j > k 


□ 


(43) 


Since /3i = 0, f3i^k = 0 for /c > 1. Moreover, in [T0| it is proved that, for a > 2, the following 
inequality holds for j > 2 

oo ■ I c; 

('I'*) 

k=j 

This inequality is exploited in the proof of the following convergence theorem, whose line is very 
similar to that of Theorem 3 in m- This result requires that the sequence of matrices {Dk} 
satisfies both the assumptions (1^ and ([3T]) . where {rj^} and {u^} are 0{^) with p > 2. 


Theorem 3.2 Assume that {6^} and {/3k} o,re chosen as in dTH) with a > 2 and let {Dk} C Vrj 
be a sequence of matrices satisfying (l28|) and ([31]) with sup^gj^ ||Dfc|| = 7 < 00 , {%} = 0{-^) 
and {I'k} = with p > 2. Then, the sequence x^^'l converges to a minimizer of F. 

Proof. The proof follows [ini Theorem 3] and [23]. We first prove that — x*\\j^^/2 

converges. From Corollary 13.11 we have that {x^F'^ ig a bounded sequence. Then, there exists 
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||a:(^) —x* Ip 

a positive scalar M such that ^ 9,11 A: > 0. From the inequality (1411) . since 

supfcgi^ ||Dfc|| = 7 , we obtain 

4>fc+i - < /3fc(4>fc - 4>fc-i) + 2/3fc4(l + r]k) + ^{1 + /3khM + Pki'k-ijM (45) 

Now, defining pk = max(0, — 4>fc-i) and recalling that (5k < 1, we obtain 


Pk+i < l5kPk + 2/3fc4(l + r]k) + PklM{r]k + Vk-i) + iMrjk (46) 


By applying ()^ 6 ]l recursively and using (1431) and /3i = 0, it follows that 


k k k 

Pk+i < 2 ^ /3i,fc<5j(l + rjj) + yM ^ l3j,k{Vj + 

j =2 j =2 


3=2 

for all A: > 2. Hence, 
+00 

'^Pk+i 

k=2 


+00 k +00 k 

s ^EE I3j^k^j{l +T]j) + 7 M 

k= 2 j =2 k= 2 j =2 

+00 k +00 

+ 7 M EE (Sj^k'nj-i + iM ^ rik 

k= 2 j =2 k =2 

+00 +CX) +00 +00 

< 2 ^ (5j(l + rij) Pj,k + iM + Uj_i) ^ (5j^k + 

3=2 k=j j =2 k=j 

+ 00 +00 +CX) 

+ 7 M I3j,k + iM ^ rik 

j =2 k=j k =2 

+00 . _ +00 . 

< 2^5j(1 + 7?j):^—+7M^(7j+i/j_i);^—+ 

i=i i=i 


+ 00 


+ 7 M Vk 


+ CX) 


i=i 


fc=i 


where the last inequality follows form (|44p . From the assumption on { 7 ^} and {j4?j} and 

{ji 2 j} are summable; from Lemma 13.71 {j 6 j} is also summable. This implies that the right side 
of the last inequality is finite, therefore {pk} is summable. We set qk = ^k — J2i=i Pi and since 
> 0 and Pi is bounded, we have that Qk is bounded from below. On the other hand 


k k 

Qk+i = $fc+i -Pk+1 -'^Pi< ^k+1 - ^fc+1 + ^k - '^Pi = qk 

i=l i=l 

Therefore {qk} is a non-increasing sequence and it is convergent. This implies that = 
Sk + Pi is convergent. 

Assume now that x G dom(g') is a limit point of i.e. there exists a subsequence 

of {xi^i} such that limj^ooxi^'i = x. By Corollary 13.11 x is a minimizer of F. Thus, the first 
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part of the proof applies also to x and we can conclude that converges. As a 

consequence of this we have 

lim — x\\\) = lim — x|||) = 0 

fc—>-oo * i —>-00 

Since ri\\x^^'> ^ the last equality implies that the whole sequence converges 

to the minimizer x. □ 

As a consequence of the analysis performed in this section, we can conclude that when the 
sequence of matrices {D^} is chosen so that the condition (|32p holds with {Cfc} = O (^) with 
p > 2, and the extrapolation parameters and (5^ are defined as in ()27l) . Algorithm [ 1 ] generates a 
sequence {x^^^} convergent to a minimizer of F with a O (^) convergence rate for the objective 
function values. 

4 Numerical experiments 

In this section we present the results of several numerical experiments which aim at evaluating 
the effectiveness of the proposed scaled forward-backward method with extrapolation (SFBEM) 
by comparison with other state-of-the-art algorithms. The numerical experiments concern three 
different optimization problems which can be formalized as in ([ 1 ]) and arise from some relevant 
real-life applications. 

4.1 Image deblurring with Poisson noise 

We consider the inverse problem of recovering an unknown image xtrue from a given data cor¬ 
rupted by noise. Bayesian approaches suggest to address this problem by minimizing a functional 
which can be expressed as the sum of a discrepancy function, typically depending on the noise 
type affecting the data, and a regularization term adding a priori information and possible con¬ 
straints. In particular, in the case of Poisson noise, the discrepancy function measuring the 
distance from the data b G M"" is the generalized Kullback-Leibler (KL) divergence of the form 

^ ( h- I 

KL(x) = ^| 6 iln-^-^^^-^ + (Ax + 65 )i- 6 i| (47) 

where A G is a linear operator modeling the distortion due to the image acquisition system 

and bg G M”' is a known positive background radiation constant. A typical assumption for the 
matrix A is that it has nonnegative elements and each row and column has at least one positive 
entry. We refer the interested reader to [5] for a detailed survey on the image deblurring problem 
in presence of Poisson noise and the properties of the KL function (I47p . 

As for the regularization term, we consider a smooth discrete version of the total variation, 
also known in the literatnre as hypersurface potential (HS) fflElEI], that, for a square m x m 
image with = re, is defined as 


m _ 

HS(x) = yJ{{Vx)ij)l + 
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where V : 




is the discrete gradient operator with periodic boundary conditions 




{(T)x)ij)i 

{{Vx)ij)2 








^n+l,j — — ^ 2,1 • 


(48) 


In conclusion, a way to recover the true image from the corrupted data is to hnd a solution of 
the following optimization problem 


min F{x) = KL(x) + /9HS(x) + Lx>o (x), (49) 

xGIR" 

where p is a positive parameter balancing the role of the regularization term and Lx>o denotes 
the indicator function of the nonnegative orthant; indeed, the unknown (the pixels of the image) 
have to be nonnegative. 

Problem (j49h can be naturally cast in the form ([T]) by setting f{x) = KL(x) + /?HS(x) and 
g{x) = Lx>oix). 

In this case dom(/) = {x G M” : Ax + bg > 0} and V/ is Lipschitz continuous on y = {x G 
M"' : X > 0} = dom(g() C dom(/). However, only an estimation from above of the Lipschitz 
constants of both VKL and VHS is known (see m and [2T], respectively). 

The numerical tests have been performed by solving the optimization problem (I49h on two 
different datasets. The original images are the 128 x 128 micro m and the 256 x 256 Cameraman, 
both used in several papers and reported in the first row of Figure [H The values of the micro 
original image are in the range [0,69], while the values of the Cameraman lay in the interval 
[0,1000]. The corrupted data (Figured! second row) have been generated by convolving the 
objects with a suitable point spread function (the psf proposed in [30] for the micro image and a 
Gaussian psf, with standard deviation equal to 1.3, for the Cameraman one), adding a constant 
background equal to 1 and perturbing the result of the convolution with Poisson noise (simulated 
through the imnoise function of the Matlab Image Processing Toolbox). For both test problems 
we assume periodic boundary conditions, thus A is block-circulant with circulant blocks and the 
matrix vector products involving A can be performed via the Fast Fourier Transform. 

We chose the regularization parameter p equal to 0.09 for the hrst test problem and 0.045 
for the second one and the parameter 5 for the HS functional equal to 0.05 for both datasets. 
We compare the SFBEM approach with some other recent methods: 

• FISTA with backtracking [3|, which can be considered as a special case of SFBEM where 
the scaling matrix is chosen at each iterate as the identity matrix. Actually our imple¬ 
mentation slightly differs from the standard EISTA by the presence of the projection after 
the extrapolation step, which is needed when solving (j49p since dom(/) does not coincide 
with M”; 

• the scaled gradient projection method (SGP) [9l [3T], which is a well known algorithm 
for differentiable constrained optimization problems. The SGP iteration has the form 
dl, where the proximity operator reduces to the projection operator onto the constraints 
set. Here the selection of steplength parameter ctfc is based on the adaptive alternation 
of the Barzilai-Borwein rules proposed in [9] and the value of is computed by a line 
search procedure. We point out that there exists a more recent variant of SGP m which 
employs a different steplength selection rule. In order to avoid redundant results, we prefer 
to consider only the standard SGP approach as a comparative tool; 
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Micro Cameraman 


Figure 1: First row: original images for the two image deblurring test problems. Second row: 
blurred and noisy images for the two image deblurring test problems. 


• the nonscaled version of SGP, hereafter indicated by GP. 

The scaling matrix for SFBEM and SGP has been selected by exploiting the split gradient idea 
suggested in m and based on a decomposition of the gradient into a nonnegative part and a 
negative one. In particular, in m the authors show that the gradient of /(x) = KL(x) + / 9 HS(x) 
can be decomposed in the form 

-V/(x) = Ukl{x) + pUns{x) - Fkl(x) - pVHs(a:) 


with Ukl, Uhs > 0 and Vkl, hns > 0. Then a possible scaling matrix is given by 

\ \ \ -1 


Dk = diag I max f min f 7 ^, 


w 


{k) 




(50) 


where the quotient is componentwise and is equal to the previous iterate for SGP and 
to the extrapolated point for SFBEM. Moreover we set 7 ^ = 


'1 + 


1013 


p = 2.1. 


{k + l)P' 

We remark that in this case the projection at Step 1 of SFBEM is independent on the scaling 
matrix since Y is the nonnegative orthant, i.e. Py,d = Py- Thus we are allowed to first 
compute = Py{x^^'^ + /3fc(x*'^^ — then choosing depending on for updating 


Finally, the sequence {/3k} employed in the dehnition of the extrapolation step for both 
FISTA and SFBEM has been chosen as in (|27j) with a = 2.1 in order to ensure the convergence 
of the sequence of the iterates. 
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The performance of the algorithms has been compared by evaluating their ability in reduc¬ 
ing the objective function: in particular we computed an approximate solution x* of (l4^ by 
performing 20000 SGP iterations and, for any method, we consider at each iterate the relative 
difference between the objective function and the minimum value 


- F{x*) 
F{x*) 


(51) 


Table [T] and Table [2] report the number of iterations and the computational time needed by 
each method to reduce the relative difference m below a certain tolerance tol. Since in both 

test problems the solution x* is unique, the relative minimization error RME(3:*'^^) = 

is also reported. The computational time presented is the average execution time (in seconds) 

over ten runs. 

Figure [2] shows the decrease of the relative differences (j51l) with respect to both the iteration 
number and the computational time. We also observed that once the quantity (ISTI) is below 
the tolerance 10“^, all algorithms provide the same relative reconstruction error RRE(x^^^) = 


1, which measures the quality of the computed solution as an approximation of Xtrue- 
More precisely, this value is 0.088 for the micro test problem (RRE(6micro) = 0.195) and 0.087 
for the Cameraman one (RRE(6 Cameraman) = 0.121). 



It. 

tol = 10 
RME 

-3 

Time 

It. 

Micro 

tol = 10- 
RME 

5 

Time 

It. 

tol = 10 
RME 

-7 

Time 

CP 

585 

0.0414 

5.69 

2100 

0.0077 

21.31 

3459 

0.0014 

34.70 

SCP 

75 

0.0261 

0.72 

203 

0.0061 

2.14 

336 

0.0016 

4.48 

FISTA 

223 

0.0410 

4.12 

888 

0.0048 

15.26 

3298 

0.0005 

56.05 

SFBEM 

64 

0.0202 

0.84 

188 

0.0038 

2.17 

515 

0.0004 

6.72 


Table 1: Number of iterations and computational time required by each algorithm to reduce the 
relative difference (j51l) below given tolerances for the micro test problem. The corresponding 
RME and computational time (average over 10 runs) are also reported. 


Cameraman 

tol = 10-3 tol = 10-5 tol = 10-^ 



It. 

RME 

Time 

It. 

RME 

Time 

It. 

RME 

Time 

CP 

1730 

0.0134 

76.76 

4046 

0.0021 

164.07 

5637 

0.0003 

229.16 

SCP 

241 

0.0102 

9.37 

1178 

0.0014 

47.48 

1671 

0.0001 

76.24 

FISTA 

226 

0.0105 

13.13 

858 

0.0011 

54.46 

3332 

0.0001 

220.71 

SFBEM 

42 

0.0103 

2.90 

163 

0.0012 

10.31 

705 

0.0001 

48.41 


Table 2: Number of iterations and computational time required by each algorithm to reduce 
the relative difference (|5ip below given thresholds for the Cameraman test problem. The corre¬ 
sponding RME and computational time (average over 10 runs) are also reported. 
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Micro 


Cameraman 


Figure 2: Plots of the relative difference (I5ip with respect to the iterations number (top) and 
computational time (bottom) for the micro (left) and Cameraman (right) test problems. 


From the numerical results shown in Tables [Hand[2] and in Figure^ it is possible to conclude 
that the presence of a non trivial scaling matrix makes the performances of SFBEM always 
superior to those of the nonscaled FISTA approach in terms of both number of iterations and 
computational time, while providing the same RRE and RME. Moreover, the comparison with 
SGP also supports the effectiveness of SEBEM: indeed, the performances of SEBEM are as 
good as those provided by SGP which is known in the literature as one of the most competitive 
algorithms to deal with image deblurring problems. 

4.2 Compressed sensing with Poisson noise 

As a second benchmark framework, we consider a compressed sensing problem which consists 
in recovering a sparse vector of nonnegative values starting from noisy measurements. More in 
detail, we assume that the observed data b S M™' is the realization of a Poisson random variable 
with expected value given by Axtrue + bg, where Xtrue G 1^” is the signal of interest, A S 
is the measurement matrix and bg is a known background. As suggested in [28], the true signal 
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can be reconstructed by addressing a minimization problem of the form 


min KL(x) + p\\x\\i + i3;>o(3;) 


(52) 


where KL is the generalized Kullback-Leibler divergence (I47h . the (.i norm induces sparsity on 
the solution, p is the positive regularization parameter and Lx>o is the indicator function of the 
nonnegative orthant. 

In this case, we set f{x) = KL(x) and g{x) = p||x||i + Lx>q{x) and Y is the nonnegative 
orthant. The operator Pa d{x) associated to g{x) can be computed in closed form [T9l Section 
II]. 

The numerical experiments are carried out on a test problem which has been generated with 
the following steps: 


(i) a matrix A G j^ioooxsooo j^g^g generated as detailed in |28] so that A preserves both the 
positivity and the flux of any signal (i.e. if z > 0 then Az > 0 and 

(ii) the signal to recover Xtrue £ has all zeros except for 20 non-zero entries drawn 

uniformly in the interval [0,10®]; 

(iii) the observed signal h G has been obtained by corrupting the vector ^xtrue + bg 

{bg = 10“^®) by means of the Matlab imnoise function. 


The regularization parameter p has been fixed equal to 10 

We compare SFBEM, FISTA with backtracking and the SPIRAL method developed in m 
and designed to solve problems of the type (f5^ . The SPIRAL approach is a forward-backward 
algorithm that employs a steplength selection strategy based on the Barzilai-Borwein rules [2]; 
the convergence is guaranteed by a proper linesearch on the values of the objective function. 
The Matlab code of SPIRAL is available on-line m- The scaling matrix for SFBEM has 
been selected by exploiting the already mentioned decomposition idea of the gradient of the 
differentiable part of the objective function: in particular, by writing the gradient of the KL 
functional as 

-VKL(x) = [/kl(x)-1/kl(x) 
with Ukl > 0 and Vkl > 0, is dehned as 


Dk 


diag 


^max 



/ y{k) 


-1 


with = W1 


10 ® 


p = 2.1. The sequence {/3fc} used to update the extrapolation point 


{k + l)P' 

has been chosen as in (j27|) with a = 10. The considered methods have been stopped when the 
relative distance between two successive iterations is less than 10“^. Table [3] shows the perfor¬ 
mance of FISTA, SFBEM and SPIRAL in solving problem (j52p in terms of number of iterations 
and computational time (average over ten runs) to make the relative distance (I5ip smaller than 
prefixed thresholds. For this test problem, the minimum point x* has been computed by the 
SPIRAL method in 10000 iterations. Moreover, we report the relative reconstruction error and 
the relative minimization error. In order to better appreciate the resnlts. Figure [3] depicts the 
decreasing behavior of the objective function with respect to both the number of iterations and 
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Compressed sensing problem 

tol = 10“^ tol = 10“^ tol = 10“^ 



It. 

RME 

Time 

It. 

RME 

Time 

It. 

RME 

Time 

FISTA 

637 

0.0111 

15.75 

933 

0.0011 

23.62 

1412 

0.0001 

35.87 

SFBEM 

369 

0.0096 

8.77 

568 

0.0011 

14.37 

806 

0.0001 

20.04 

SPIRAL 

1180 

0.0120 

15.87 

1309 

0.0013 

17.33 

1379 

0.0001 

18.10 


Table 3: Number of iterations and computational time required by each algorithm to reduce 
the relative difference (I5ip below given tolerances for the compressed sensing test problem. The 
corresponding RME and computational time (average over 10 runs) are also reported. 




Figure 3: Compressed sensing problem: relative difference (|5ip with respect to the iterations 
number (left) and computational time (right). 


the computational time. All the considered algorithms yield to the same value of the RRE equal 
to 0.075. 

The results obtained on the compressed sensing problem confirm the same conclusions in the 
image deblurring framework: the benefit of applying the SFBEM instead of FISTA is evident 
from the significant reduction of the number of iterations and computational time, as reported 
in Table [3l 

4.3 Probability density estimation 

The last optimization problem we considered concerns the estimation of an unknown Gaus¬ 
sian mixture probability density. More in detail, if the sample {ri,r 2 , ...,Tn | r* G M} has been 
drawn from an unknown probability density function ^(r) which can be expressed as a Gaussian 
mixture, then a possible estimator has the form m 

n 

Kt) = '^XiKir{T,Ti) (53) 

i=l 

where Kq-P', Tj) is a Gaussian kernel with variance a and center r* and Xi is a suitable coefficient. 
In [T7] the authors proved that the weight vector x can be computed as a solution of the following 
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minimization problem 


min -x^ Cx — X + t^ + (x) 
xgK" 2 V ^ 


(54) 


where 


(a) the element Cij of the matrix C € 

^i,j ^ 2 (t )) 


^nxn -g Gaussian kernel of variance 2a, i.e. 
1 " 

(b) the i-th component of the vector p G M"' is defined as pi = — } Ka{Ti, Tj); 

n 


i=i 


(c) i^+ is the indicator function of the simplex = {x G M” | Xj > 0 Vi, Y^^=i = !}• 

In this case we set f{x) = ^x'^Cx —p^x, g{x) = i^+{x) and Y = M"'. Thus, the operator 
Pa,D{x) consists in the projection onto the simplex Aj*". Such projection can be formulated as a 


root-finding problem and effectively computed by the secant-like algorithm proposed in [16 
For the numerical experiments we analyzed the following Gaussian mixture 


1 ^ 

PiT) = T 


i=l 


with cj,- = 



2—1 


2—1 


and Cj = 14 


— 1 , i = 1, ...,5. The matrix C and the vector p 


r 

have been generated with a sample of 1000 points drawn from p by using the gmdistribution 
function of the Matlab Statistics and Machine Learning Toolbox. 

The effectiveness of SFBEM has been evaluated in a comparison with FISTA with back¬ 
tracking, SGP and GP. 

The scaling matrix for SFBEM and SGP has been selected in the form 

D, = diag (max (P min (7., ^) ) ) 


with = . 1 + 


IQio 


-, p = 2.1 and equal to for SEBEM and for SGP. This 


{k + l)P'' 

choice of the scaling matrix mimics the split gradient based scaling proposed in |1| for quadratic 
problems. The extrapolation parameters sequence {/3k} has been chosen as in (I27p with a = 2.1. 
Table m reports the number of iterations and the computational time (an average value over ten 
runs) needed by the four methods to ensure a sufficient decrease of the distances ([^ . where 
X* has been computed by means of 25000 EISTA iterations. GP and SGP do not succeed 
in satisfying the more restrictive threshold within the prefixed maximum number of iterations 
(25000). In EigurelHwe can appreciate the decrease of the objective function obtained by the 
considered algorithms and in Figure [5] the reconstruction of the probability density function 
(pdf) through the estimator ([53]) where for simplicity we assume cr = 1. 

The numerical experiments performed in the probability density estimation setting rein¬ 
force the validity of the proposed SFBEM scheme in comparison with the other state-of-the-art 
approaches we tested. 
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Probability density estimation problem 



tol = 

It. 

10“3 

Time 

tol = 
It. 

10"® 

Time 

tol = 

It. 

10"^ 

Time 

GP 

Ill 

0.51 

20479 

114.62 

- 

- 

SGP 

90 

0.98 

4305 

26.01 

- 

- 

FISTA 

54 

0.94 

2141 

16.54 

21885 

165.00 

SFBEM 

53 

0.67 

810 

6.46 

3883 

28.88 


Table 4: Number of iterations and computational time required by each algorithm to bring 
the relative difference (15 ip below given thresholds for the probability density estimation test 
problem. 



Figure 4: Relative difference between the objective function values provided by the 

different methods and the minimum value F{x*). 



T 


Figure 5: Density estimation results. 
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5 Conclusions 


In this paper we proposed a variable metric forward-backward method with extrapolation based 
on two fundamental ingredients: a symmetric and positive definite scaling matrix multiplying 
the gradient of the differentiable part of the objective function, possibly capturing useful features 
of the problem to handle, and an inertial step which employs the information of the two last 
iterations in order to compute the new one. A proper backtracking strategy ensuring a sufficient 
decrease of the objective function and suitable adaptive bounds on the scaling matrix allow 
to prove the convergence of the scheme to a minimizer of the considered problem. We also 
provided a convergence rate estimate which is similar to existing convergence rate results for 
nonscaled forward-backward algorithms with extrapolation. Numerical experiments, carried out 
on optimization problems of different nature, showed very promising results in comparison with 
other algorithms which have already gained a great popularity in the literature. Future work will 
be addressed to analyze the possibility of introducing an inexact solution of the minimization 
problem which characterizes the backward step. 
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